ABSTRACT

Two-variable numerical functions are widely used in various
applications, such as computer graphics and digital signal
processing, where fast and compact hardware implementations are
required. This paper introduces the bilinear interpolation method
to produce fast and compact numerical function generators (NFGs)
for two-variable functions. It also introduces a design method for
symmetric two-variable functions, which can reduce the memory size
needed for symmetric functions by nearly half with a small speed
penalty. Experimental results show that the bilinear interpolation
method can significantly reduce the memory size needed for
two-variable functions, and that the speed of NFGs based on the
bilinear method is comparable to that of NFGs based on tangent
plane approximation. For a complicated function, our NFG is faster
and more compact than a circuit designed using a one-variable NFG.

1.  INTRODUCTION 

The availability of large quantities of logic in LSIs and the need
for high-speed arithmetic function computation in modern data
processing applications have created a unique research opportunity
[7]. High-speed arithmetic function computation will almost
certainly result in significant changes in the way engineers
perform data processing tasks. For example, image recognition tasks
will more likely be performed at the site of the image collection,
such as on-board reconnaissance vehicles.

High-speed computation of one-variable arithmetic functions (e.g.,
$\sin(x)$ and $\log(x)$) has been extensively studied [2, 6, 8,
10-12]. However, significantly less work has been done on the
high-speed implementation of multi-variable functions (e.g.,
$\sqrt{x^2 + y^2 + z^2}$ and $\arctan(x/y)$) [3, 4, 13]. The
existing design approaches apply a different method for each
function realized. As far as we know, no systematic design method
for generic multi-variable functions has been presented.

A straightforward design method for an arbitrary multi-variable
function is to use a single memory in which the address is a
combination of the values of the variables and the content of that
address is the corresponding value of the function. This method is
fast, but requires a $2^{mn}$-word memory to implement an
$m$-variable function with $n$ bits for each variable. Thus, unlike
the case of one-variable functions, this method is impractical even
for low-precision applications.
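To make the growth concrete, the following minimal sketch evaluates the $2^{mn}$ word count for one and two variables. The 12-bit width is only an illustrative assumption, and the function name `direct_lookup_words` is hypothetical.

```python
def direct_lookup_words(num_vars: int, bits_per_var: int) -> int:
    """Words in a single lookup memory addressed by all inputs concatenated."""
    return 2 ** (num_vars * bits_per_var)

# Illustrative assumption: 12 bits per variable.
one_var_words = direct_lookup_words(1, 12)
two_var_words = direct_lookup_words(2, 12)
print(one_var_words)  # 4096
print(two_var_words)  # 16777216
```

Under this assumption, adding a second 12-bit variable multiplies the word count by 4096, which is why a single memory quickly becomes impractical.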

To produce a practical implementation, multi-variable functions can
be designed using a combination of one-variable function
generators, multipliers, and adders [3, 4]. This design can reduce
the required memory size. However, depending on the function, it
can produce a slow implementation because of its complex
architecture. A complex architecture also makes error analysis
harder.

This paper proposes a systematic design method for two-variable
functions. Since our design method is based on a piecewise
polynomial approximation, the hardware architecture is simple even
for complex functions. However, polynomial approximation methods
tend to require a large memory size. For multi-variable functions,
using higher-order polynomials is not always effective in reducing
the memory size, because a higher polynomial order requires many
more polynomial coefficients. Higher-order polynomials also produce
slower NFGs. Thus, for polynomial approximation methods, reducing
memory size with a small speed penalty is a key issue. To
accomplish this, this paper introduces the bilinear interpolation
method. This paper also introduces a design method and an
architecture for symmetric two-variable functions. Error analysis
for our NFGs is omitted because it is almost the same as [11]. It
is guaranteed that the maximum error of our fixed-point NFGs is
smaller than $2^{-m}$ (i.e., $m$-bit accuracy NFGs), where $m$ is
the number of fractional bits for the inputs and the output.

2.  POLYNOMIAL  APPROXIMATION  USING 
BILINEAR  INTERPOLATION 

Bilinear interpolation is an extension of linear interpolation. It
interpolates two-variable functions $f(X,Y)$ using four points. Let
the four points be $(X_1,Y_1)$, $(X_1,Y_2)$, $(X_2,Y_1)$, and
$(X_2,Y_2)$, and let $f_{11} = f(X_1,Y_1)$, $f_{12} = f(X_1,Y_2)$,
$f_{21} = f(X_2,Y_1)$, and $f_{22} = f(X_2,Y_2)$. Then, the
bilinear interpolation $g(X,Y)$ is given by:

$$g(X,Y) = \frac{f_{11}(X_2-X)(Y_2-Y) + f_{21}(X-X_1)(Y_2-Y)}{(X_2-X_1)(Y_2-Y_1)} + \frac{f_{12}(X_2-X)(Y-Y_1) + f_{22}(X-X_1)(Y-Y_1)}{(X_2-X_1)(Y_2-Y_1)}.$$

By expanding and rearranging this, we obtain the following form:
$g(X,Y) = C_{xy}XY + C_xX + C_yY + C_0$, where

$$C_{xy} = \frac{f_{11} - f_{21} - f_{12} + f_{22}}{(X_2-X_1)(Y_2-Y_1)},$$

$$C_x = \frac{-f_{11}Y_2 + f_{21}Y_2 + f_{12}Y_1 - f_{22}Y_1}{(X_2-X_1)(Y_2-Y_1)},$$

$$C_y = \frac{-f_{11}X_2 + f_{21}X_1 + f_{12}X_2 - f_{22}X_1}{(X_2-X_1)(Y_2-Y_1)},$$

and

$$C_0 = \frac{f_{11}X_2Y_2 - f_{21}X_1Y_2 - f_{12}X_2Y_1 + f_{22}X_1Y_1}{(X_2-X_1)(Y_2-Y_1)}.$$
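The coefficient formulas can be checked numerically with a short sketch. The helper names `bilinear_coeffs` and `eval_poly` and the test function `f` are illustrative assumptions, not part of the original design flow. Because the chosen `f` is itself of the form $aXY + bX + cY + d$, the interpolation should recover its coefficients exactly.

```python
def bilinear_coeffs(f, x1, x2, y1, y2):
    """Return (Cxy, Cx, Cy, C0) of the bilinear interpolation through four corners."""
    f11, f12 = f(x1, y1), f(x1, y2)
    f21, f22 = f(x2, y1), f(x2, y2)
    d = (x2 - x1) * (y2 - y1)
    cxy = (f11 - f21 - f12 + f22) / d
    cx = (-f11 * y2 + f21 * y2 + f12 * y1 - f22 * y1) / d
    cy = (-f11 * x2 + f21 * x1 + f12 * x2 - f22 * x1) / d
    c0 = (f11 * x2 * y2 - f21 * x1 * y2 - f12 * x2 * y1 + f22 * x1 * y1) / d
    return cxy, cx, cy, c0

def eval_poly(coeffs, x, y):
    """Evaluate g(X,Y) = Cxy*X*Y + Cx*X + Cy*Y + C0."""
    cxy, cx, cy, c0 = coeffs
    return cxy * x * y + cx * x + cy * y + c0

# Hypothetical test function that is already bilinear.
f = lambda x, y: 2 * x * y + 3 * x - y + 5
coeffs = bilinear_coeffs(f, 0.25, 0.5, 0.5, 1.0)
print(coeffs)  # (2.0, 3.0, -1.0, 5.0)
```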

To approximate a given two-variable function using bilinear
interpolation, we first partition the given domain of the function
into segments. For each segment, we approximate the function using
bilinear interpolation. In this case, the memory size and speed of
an NFG are strongly dependent on the efficiency of the segmentation
algorithm. Thus, effective segmentation algorithms are needed to
achieve fast and compact NFGs. We use a recursive planar
segmentation algorithm [9] (based on bilinear interpolation).

The algorithm begins by computing a bilinear interpolation using
the four corner points of the given domain. This is an initial
approximation. If its approximation error is larger than the given
acceptable error, the domain is partitioned into four equal-sized
square segments. For each segment, a bilinear interpolation is
computed using the four corner points of the segment. The same
process is repeated recursively until all segments have an
acceptable approximation error. Note that this algorithm creates
segments of size $w_i \times w_i$, where $w_i = 2^{h_i} \times
2^{-m_{in}}$, $m_{in}$ is the number of fractional bits for $X$ and
$Y$, and $h_i$ is an integer. That is, all the segmentation points
$P_i$ and $Q_i$ are restricted to values such that the least
significant $h_i$ bits are 0 (i.e., $P_i = (\ldots\, p_{-j+1}\,
p_{-j}\, 0 0 \ldots 0)_2$, where $j = m_{in} - h_i$).

In this algorithm, to reduce the approximation error, the maximum
positive error $max_{fg}$ and the maximum negative error $min_{fg}$
are equalized by a vertical shift of $g(X,Y)$ with correction value
$v = (max_{fg} + min_{fg})/2$. Thus, the approximation error is
$(max_{fg} - min_{fg})/2$, and the approximating polynomial is
$g(X,Y) + v$.
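The recursive partitioning and the vertical shift can be sketched as follows. This is only an illustration under stated assumptions: the error of each segment is estimated on a small sample grid (the algorithm in [9] may evaluate it differently), the error is taken as $f - g$, recursion stops at a hypothetical minimum width `min_w`, and all function names are invented for this example.

```python
import math

def bilinear(f, bx, by, w, x, y):
    """Bilinear interpolation of f from the corners of the square at (bx, by), width w."""
    tx, ty = (x - bx) / w, (y - by) / w
    return (f(bx, by) * (1 - tx) * (1 - ty)
            + f(bx + w, by) * tx * (1 - ty)
            + f(bx, by + w) * (1 - tx) * ty
            + f(bx + w, by + w) * tx * ty)

def equalized_error(f, bx, by, w, samples=8):
    """Return ((max - min)/2, v) of the error f - g, estimated on a sample grid."""
    errs = []
    for i in range(samples + 1):
        for j in range(samples + 1):
            x, y = bx + w * i / samples, by + w * j / samples
            errs.append(f(x, y) - bilinear(f, bx, by, w, x, y))
    v = (max(errs) + min(errs)) / 2      # vertical shift that equalizes the errors
    return (max(errs) - min(errs)) / 2, v

def segment(f, bx, by, w, eps, min_w):
    """Split a square into four equal squares until the error is acceptable."""
    err, v = equalized_error(f, bx, by, w)
    if err <= eps or w <= min_w:
        return [(bx, by, w, v)]
    h = w / 2
    result = []
    for dx in (0.0, h):
        for dy in (0.0, h):
            result.extend(segment(f, bx + dx, by + dy, h, eps, min_w))
    return result

# Hypothetical run: sin(pi*X*Y) on the unit square with a loose error bound.
segments = segment(lambda x, y: math.sin(math.pi * x * y),
                   0.0, 0.0, 1.0, eps=2.0 ** -8, min_w=2.0 ** -6)
print(len(segments))
```

Every segment width produced this way is a power of two, which matches the restriction on segmentation points noted above.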

For each segment $\{[B_x,E_x), [B_y,E_y)\}$, since $B_x \le X < E_x$
and $B_y \le Y < E_y$ hold, we can offset $X$ and $Y$ by $B_x$ and
$B_y$ to compute the approximating polynomial $g(X,Y) + v$. By
using the offset inputs $(X - B_x)$ and $(Y - B_y)$ instead of $X$
and $Y$, we reduce the size of the multipliers needed to compute
$g(X,Y) + v$. By substituting $X - B_x + B_x$ and $Y - B_y + B_y$
for $X$ and $Y$ respectively, we transform $g(X,Y) + v$ as follows:

$$\begin{aligned}
g(X,Y) + v &= C_{xy}(X - B_x + B_x)(Y - B_y + B_y) + C_x(X - B_x + B_x) + C_y(Y - B_y + B_y) + C_0 + v \\
&= C_{xy}(X - B_x)(Y - B_y) + (C_x + C_{xy}B_y)(X - B_x) + (C_y + C_{xy}B_x)(Y - B_y) + C_{xy}B_xB_y + C_xB_x + C_yB_y + C_0 + v \\
&= C_{xy}(X - B_x)(Y - B_y) + C'_x(X - B_x) + C'_y(Y - B_y) + C'_0, \qquad (1)
\end{aligned}$$

where $C'_x = C_x + C_{xy}B_y$, $C'_y = C_y + C_{xy}B_x$, and $C'_0 =
C_{xy}B_xB_y + C_xB_x + C_yB_y + C_0 + v$.
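As a quick sanity check of (1), the sketch below evaluates the polynomial both in its original form and in the offset form and compares the results. The coefficient values, segment origin, and shift are arbitrary illustrative numbers, and the function names are hypothetical.

```python
def offset_coeffs(cxy, cx, cy, c0, bx, by, v):
    """Fold the segment origin (bx, by) and shift v into the stored coefficients."""
    cx_off = cx + cxy * by
    cy_off = cy + cxy * bx
    c0_off = cxy * bx * by + cx * bx + cy * by + c0 + v
    return cxy, cx_off, cy_off, c0_off

def eval_direct(cxy, cx, cy, c0, v, x, y):
    """g(X,Y) + v using the original inputs."""
    return cxy * x * y + cx * x + cy * y + c0 + v

def eval_offset(coeffs, bx, by, x, y):
    """Right-hand side of (1) using the offset inputs X - Bx and Y - By."""
    cxy, cx_off, cy_off, c0_off = coeffs
    dx, dy = x - bx, y - by
    return cxy * dx * dy + cx_off * dx + cy_off * dy + c0_off

# Arbitrary illustrative values (all exactly representable in binary).
coeffs = offset_coeffs(2.0, 3.0, -1.0, 5.0, bx=0.25, by=0.5, v=0.125)
a = eval_direct(2.0, 3.0, -1.0, 5.0, 0.125, 0.375, 0.75)
b = eval_offset(coeffs, 0.25, 0.5, 0.375, 0.75)
print(a, b)  # 6.0625 6.0625
```

The offsets `dx` and `dy` are bounded by the segment width, so the multipliers operate on fewer significant bits than with the full inputs.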

3.  ARCHITECTURE  BASED  ON  BILINEAR 
INTERPOLATION 

Fig.  1  shows  architectures  for  two-variable  NFGs  realizing 
(1).  The  Segment  Index  Encoder  converts  values  of  X  and 
Y  into  a  segment  number.  This,  in  turn,  is  applied  as  the 
address  input  of  the  Coefficients  Memory.  The  coefficients 
are  applied  to  adders  and  multipliers  to  form  the  polynomial 
value  g(X,Y)  +  v.  Note  the  use  of  bitwise  ANDs  in  Fig.  1  to 
compute  X  —  Bx  and  Y  —  By.  In  recursive  segmentation,  we 
can  realize  X  —  Bx  and  Y  —  By  using  AND  gates  driven  on 
one  side  by  Bx  and  By,  respectively  [8]. 
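The AND-based offset can be illustrated in software. The sketch assumes that each segment origin $B_x$ equals $X$ with its least significant $h$ bits cleared, as the recursive segmentation guarantees, and that the AND gates receive the complement of $B_x$; the inversion, the 8-bit width, and the function name are assumptions made for this example only.

```python
def offset_by_and(x: int, b: int) -> int:
    """Compute x - b as x AND (NOT b); valid when b is x with its low bits cleared."""
    return x & ~b

N_BITS = 8  # hypothetical input width
for h in range(N_BITS + 1):
    low_mask = (1 << h) - 1
    for x in range(1 << N_BITS):
        b = x & ~low_mask            # segment origin: low h bits forced to 0
        assert offset_by_and(x, b) == x - b == (x & low_mask)

print(offset_by_and(0b10110110, 0b10110000))  # 6
```

No subtractor is needed because the upper bits of $X$ and $B_x$ always agree inside a segment.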

Fig. 1. Architectures for two-variable NFGs based on bilinear
interpolation. (a) For general functions. (b) For symmetric
functions.

Fig. 2. Segment index encoder. (a) Segment index function. (b) LUT
cascade and adders [8].

The segment index encoder realizes the segment index function
$\{0,1\}^n \times \{0,1\}^n \to \{0, 1, \ldots, k-1\}$ shown in
Fig. 2(a), where $X$ and $Y$ have $n$ bits, and $k$ denotes the
number of segments. We realize this function with the architecture
shown in Fig. 2(b). We use an edge-valued BDD (EVBDD) [5] to design
this architecture. In this architecture, the interconnecting lines
between adjacent LUT memories determine the position in the EVBDD
(labeled rails), and the outputs from each LUT memory to the adders
tally the function value (labeled Arails). Consider the design of
the LUT cascade and adders in Fig. 2(b), given the segmentation
produced by the algorithm in [9].

We begin by representing the segment index function using a
multi-terminal BDD (MTBDD) [1]. Then, we convert the MTBDD into an
EVBDD. By decomposing the EVBDD, we obtain the architecture in
Fig. 2(b). For more detail on this architecture, see [8].

In our architecture, the coefficients memory and the LUT memories
of the segment index encoder are implemented by RAMs. Thus, by
changing the data for the coefficients memory and the LUT memories,
a wide class of two-variable functions can be realized by a single
architecture.

4.  DESIGN  METHOD  FOR  SYMMETRIC 
FUNCTIONS 

Definition 1 A two-variable function $f(X,Y)$ is symmetric if
$f(X,Y) = f(Y,X)$.


Table 1. Number of segments needed for three design methods.
X and Y: 12-bit accuracy (approximation error: $2^{-14}$).

| Function f(X,Y) | Domain of X | Domain of Y | Tangent | Bilinear | Sym. |
|---|---|---|---|---|---|
| $\sin(\pi X)\sqrt{Y}$ | [0, 1) | (0, 1) | 38,773 | 29,875 | N/A |
| $\sin(\pi XY)$ | [0, 1) | [0, 1) | 26,122 | 8,389 | 4,232 |
| $X^4Y^5$ | [0, 1) | [0, 1) | 8,179 | 3,592 | N/A |
| $1/\sqrt{X^2+Y^2}$ | (0, 1) | (0, 1) | 173,552 | 103,046 | 51,687 |
| $XY/\sqrt{X^2+Y^2}$ | (0, 1) | (0, 1) | 6,523 | 4,114 | 2,104 |
| WaveRings | [0, $\pi$) | [0, $\pi$) | 28,377 | 16,278 | 8,202 |
| Sombrero | (0, 8) | (0, 8) | 32,371 | 18,664 | 9,398 |
| Gaussian | (0, 1) | (0, 1) | 141,113 | 86,633 | N/A |
| $\sqrt{X^2+Y^2}$ | (0, 1) | (0, 1) | 6,160 | 4,093 | 2,083 |
| $\sqrt{X^3+Y^3}$ | (0, 1) | (0, 1) | 6,790 | 3,955 | 2,027 |

Sym.: Two symmetric segments are counted as one segment.


Symmetric functions are commonly found in practical applications of
NFGs. For example, $\sqrt{X^2+Y^2}$, which is used in converting
from rectangular to polar coordinates, is symmetric. This section
presents a design method and an architecture that take advantage of
the function's symmetry.

Definition 2 A segmentation is symmetric if for every segment
$\{[B_{x1},E_{x1}), [B_{y1},E_{y1})\}$ such that $B_{x1} \ne B_{y1}$
or $E_{x1} \ne E_{y1}$, there is another segment
$\{[B_{x2},E_{x2}), [B_{y2},E_{y2})\}$ such that $B_{x1} = B_{y2}$,
$E_{x1} = E_{y2}$, $B_{y1} = B_{x2}$, and $E_{y1} = E_{x2}$.
Symmetric segments are a pair of such segments.

Lemma 1 Let $f(X,Y)$ be a symmetric function, and let $g_1(X,Y)$
and $g_2(X,Y)$ be bilinear interpolations of $f(X,Y)$ for symmetric
segments. Then, $g_1(X,Y) = g_2(Y,X)$.

Theorem 1 The segmentation of a symmetric function produced by the
recursive planar segmentation algorithm is symmetric.

From Lemma 1 and Theorem 1, we can use only one bilinear
interpolation to approximate the given symmetric function in
symmetric segments. By assigning the same segment index to
symmetric segments, we can reduce the size of the coefficients
memory by nearly half.

Fig. 1(b) shows an architecture for symmetric functions. Here, the
coefficients memory stores only data for segments such that
$X \le Y$. For the other segments, approximated values are computed
using Lemma 1, that is, by exchanging $X$ and $Y$. Since the
comparator and multiplexers operate in parallel with the segment
index encoder, there is no speed penalty due to these additional
circuits.
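A software analogue of this scheme is sketched below. To keep it short it uses a uniform grid of segments rather than the recursive segmentation, stores corner values instead of polynomial coefficients, and uses $\sqrt{X^2+Y^2}$ on the unit square as the example; all of these choices, and the function names, are assumptions made for illustration only.

```python
import math

def f(x, y):
    """Example symmetric function."""
    return math.sqrt(x * x + y * y)

def build_table(func, n):
    """Store corner values (f11, f21, f12, f22) only for segments with bx <= by."""
    w = 1.0 / n
    table = {}
    for i in range(n):
        for j in range(i, n):
            bx, by = i * w, j * w
            table[(i, j)] = (func(bx, by), func(bx + w, by),
                             func(bx, by + w), func(bx + w, by + w))
    return table

def evaluate(table, n, x, y):
    """Swap inputs when x > y (comparator and multiplexers), then interpolate."""
    if x > y:
        x, y = y, x
    i, j = int(x * n), int(y * n)
    f11, f21, f12, f22 = table[(i, j)]
    tx, ty = x * n - i, y * n - j
    return (f11 * (1 - tx) * (1 - ty) + f21 * tx * (1 - ty)
            + f12 * (1 - tx) * ty + f22 * tx * ty)

N = 16
table = build_table(f, N)
print(len(table), N * N)  # 136 256
print(evaluate(table, N, 0.7, 0.2), f(0.7, 0.2))
```

With 16 x 16 segments, 136 entries are stored instead of 256; the saving approaches one half as the number of segments grows, since only the diagonal segments lack a partner.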

5.  EXPERIMENTAL  RESULTS 


5.1.  Number  of  Segments  and  Memory  Sizes 


Table 1 compares the number of segments needed for the bilinear
interpolation method with that for the tangent plane approximation¹
[9] for various functions. For those functions that are symmetric,
Table 1 also shows the number of symmetric segments. In this table,
WaveRings, Sombrero, and Gaussian are:


$$\mathrm{WaveRings} = \frac{\cos\left(\sqrt{X^2+Y^2}\right)}{\sqrt{X^2+Y^2} + 0.25}$$


¹ In the tangent plane approximation, the function is realized using
$g(X,Y) = C_xX + C_yY + C_0$.


Table 2. Total memory sizes [bits] needed for 12-bit accuracy NFGs
based on three design methods.

| Function f(X,Y) | Tangent | Bilinear | Sym. |
|---|---|---|---|
| $\sin(\pi X)\sqrt{Y}$ | 2,030,356 | 2,167,788* | N/A |
| $\sin(\pi XY)$ | 1,313,684 | 701,311 | 356,164 |
| $X^4Y^5$ | 516,230 | 266,644 | N/A |
| $1/\sqrt{X^2+Y^2}$ | 13,054,030 | 9,412,758 | 4,698,276 |
| $XY/\sqrt{X^2+Y^2}$ | 369,189 | 293,330 | 153,176 |
| WaveRings | 1,886,924 | 1,559,560 | 797,279 |
| Sombrero | 2,051,615 | 1,487,068 | 757,238 |
| Gaussian | 11,345,482 | 8,285,179 | N/A |
| $\sqrt{X^2+Y^2}$ | 316,128 | 287,868 | 153,291 |
| $\sqrt{X^3+Y^3}$ | 405,576 | 294,328 | 154,309 |

* Bilinear is worse than tangent for only 1 case.


$$\mathrm{Sombrero} = \frac{\sin\left(\sqrt{X^2+Y^2}\right)}{\sqrt{X^2+Y^2}}, \qquad \mathrm{Gaussian} = \frac{1}{Y\sqrt{2\pi}}\, e^{-\frac{X^2}{2Y^2}}$$

Table 1 shows that, for all functions, the bilinear interpolation
method requires fewer segments than the tangent plane
approximation, and the number of symmetric segments is smaller
still. Thus, the bilinear interpolation method and the symmetric
technique significantly reduce the number of words in the
coefficients memory. For $\sin(\pi X)\sqrt{Y}$, the bilinear
interpolation method is also superior to the tangent plane method.
However, the number of segments needed in the bilinear
interpolation method is only slightly smaller.

Table  2  compares  the  total  memory  sizes  needed  for 
NFGs  based  on  the  three  design  methods.  Note  that  our 
NFGs  have  two  kinds  of  memories:  coefficients  memory 
and  LUT  memory;  the  memory  size  shown  is  the  sum  of 
the  two. 

Table 2 shows that, for all functions except
$\sin(\pi X)\sqrt{Y}$, the bilinear method uses less memory than
the tangent plane approximation, even though the bilinear
interpolation requires more polynomial coefficients. That is, for
many functions, the reduction in the number of segments due to
bilinear interpolation is sufficient to compensate for the increase
in polynomial coefficients. For symmetric functions in particular,
using the symmetric technique shown in Section 4 further reduces
the memory size.

5.2.  FPGA  Implementation  Results 

We implemented 12-bit accuracy NFGs based on the three design
methods using the Altera Stratix III FPGA. Since the FPGA has
adaptive look-up tables (ALUTs) that can realize fast adders,
synchronous memory blocks, and dedicated DSPs (multipliers), our
NFGs are efficiently implemented by those hardware resources in the
FPGA. Table 3 compares the FPGA implementation results for 12-bit
accuracy. In this table, the columns "Delay" show the total delay
time of each NFG from the input to the output, in nanoseconds.
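The reported delays appear to be consistent with the number of pipeline stages multiplied by the clock period. The sketch below checks this reading against three rows of Table 3; the relation itself is an assumption inferred from the tabulated values (which are rounded), and the function name is hypothetical.

```python
def pipeline_delay_ns(stages: int, freq_mhz: float) -> float:
    """Input-to-output latency of a pipeline: stages times the clock period."""
    return stages * 1000.0 / freq_mhz

# (stages, frequency in MHz, delay in ns as listed in Table 3)
rows = [(12, 243, 49), (15, 262, 57), (7, 285, 25)]
for stages, freq, listed in rows:
    estimate = pipeline_delay_ns(stages, freq)
    print(stages, freq, round(estimate, 1), listed)
```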

The NFGs based on tangent plane approximation are faster because
they require fewer multipliers and fewer polynomial coefficients.
However, for the $1/\sqrt{X^2+Y^2}$ and Gaussian functions, the
memory needed for NFGs based on tangent plane approximation is so
large that they could not be implemented in the FPGA. On the other
hand, NFGs based on the bilinear method require less memory, and
their speed is comparable to the speed of the NFGs based on tangent
plane approximation. Since the symmetric technique significantly
reduces the memory size, it is easier to implement with an FPGA.
Table 3 shows that the symmetric


Table 3. FPGA implementation of 12-bit accuracy NFGs based on three design methods.

FPGA device: Altera Stratix III (EP3SL340F1517C4)
Logic synthesis tool: Synplify Pro Ver. 8.8

| Function f(X,Y) | Tangent #ALUTs | Tangent #DSPs | Tangent Freq. [MHz] | Tangent #stages | Tangent Delay [ns] | Bilinear #ALUTs | Bilinear #DSPs | Bilinear Freq. [MHz] | Bilinear #stages | Bilinear Delay [ns] | Sym. #ALUTs | Sym. #DSPs | Sym. Freq. [MHz] | Sym. #stages | Sym. Delay [ns] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $\sin(\pi X)\sqrt{Y}$ | 270 | 4 | 243 | 12 | 49 | 394 | 6 | 262 | 15 | 57 | N/A | N/A | N/A | N/A | N/A |
| $\sin(\pi XY)$ | 131 | 4 | 285 | 7 | 25 | 274 | 14 | 245 | 9 | 37 | 351 | 14 | 245 | 10 | 41 |
| $X^4Y^5$ | 235 | 4 | 243 | 10 | 41 | 286 | 14 | 251 | 10 | 40 | N/A | N/A | N/A | N/A | N/A |
| $1/\sqrt{X^2+Y^2}$ | - | - | - | 17 | - | 592 | 7 | 249 | 18 | 72 | 543 | 7 | 252 | 16 | 64 |
| $XY/\sqrt{X^2+Y^2}$ | 182 | 4 | 285 | 10 | 35 | 206 | 7 | 262 | 10 | 38 | 293 | 7 | 262 | 11 | 42 |
| WaveRings | 341 | 4 | 243 | 12 | 49 | 474 | 14 | 262 | 13 | 50 | 489 | 14 | 256 | 13 | 51 |
| Sombrero | 221 | 4 | 243 | 10 | 41 | 281 | 14 | 245 | 11 | 45 | 296 | 14 | 245 | 11 | 45 |
| Gaussian | - | - | - | 17 | - | 639 | 14 | 252 | 17 | 67 | N/A | N/A | N/A | N/A | N/A |
| $\sqrt{X^2+Y^2}$ | 147 | 4 | 285 | 8 | 28 | 195 | 7 | 262 | | 38 | 256 | 7 | 253 | 11 | 43 |
| $\sqrt{X^3+Y^3}$ | 193 | 4 | 285 | 10 | 35 | 201 | 13 | 272 | | 37 | 242 | 12 | 244 | 10 | 41 |

-: NFGs cannot be mapped into the FPGA due to insufficient memory blocks.

#ALUTs: Number of ALUTs. #DSPs: Number of 9-bit x 9-bit DSP units. Freq.: Operating frequency. #stages: Number of pipeline stages.


Table 4. FPGA implementation of various NFGs for
$XY/\sqrt{X^2+Y^2}$ (12-bit accuracy).

FPGA device: Altera Stratix III (EP3SL340F1517C4)
Logic synthesis tool: Synplify Pro Ver. 8.8

| NFGs | Memory [bits] | #ALUTs | #DSPs | Freq. [MHz] | #stages | Delay [nsec.] |
|---|---|---|---|---|---|---|
| One-variable | 269,136 | 273 | 16 | 195 | 14 | 72 |
| Tangent | 369,189 | 182 | 4 | 285 | 10 | 35 |
| Bilinear | 293,330 | 206 | 7 | 262 | 10 | 38 |
| Symmetric | 153,176 | 293 | 7 | 262 | 11 | 42 |

technique has some speed penalty. However, it is reasonable. To
show this, we designed $XY/\sqrt{X^2+Y^2}$ using a one-variable NFG
for $1/\sqrt{X}$, two squaring circuits, an adder, and two
multipliers. The one-variable NFG was realized by the method shown
in [8], which is based on linear approximation and non-uniform
segment lengths. Table 4 compares the results with our NFGs.

Our NFGs require fewer DSPs than the one-variable implementation
and have much shorter delay; the tangent plane and bilinear NFGs
also require fewer ALUTs. In particular, the NFG designed by the
symmetric method requires less memory and has a shorter delay than
the one-variable NFG. This shows that the speed penalty of our
methods is small.

6.  CONCLUDING  REMARKS 

We have proposed a design method and a programmable architecture
for two-variable numerical function generators using bilinear
interpolation. To realize a two-variable function in hardware, we
partition the given domain of the function into segments, and
approximate the given function using bilinear interpolation in each
segment. We also presented a design method and an architecture for
two-variable symmetric functions. Experimental results show that
the proposed method can significantly reduce the memory size needed
for two-variable functions with a small speed penalty.